Statistical packing of resource requirements in data centers

ABSTRACT

A computer-implemented method of managing resources in a virtual machine environment can include determining a specification of provisioning success corresponding to each of a plurality of jobs in the virtual machine environment, forming a prioritized listing of the plurality of jobs and, responsive to the specification of provisioning success and the prioritized listing, providing a resource specification for each of the plurality of jobs. The providing can include determining a first prediction of resource needs corresponding to each of a first subset of the plurality of jobs and determining a second prediction of resource needs corresponding to a second subset of the plurality of jobs.

CROSS-REFERENCE TO RELATED APPLICATION

This application is a divisional of U.S. patent application Ser. No. 12/253,111, titled “STATISTICAL PACKING OF RESOURCE REQUIREMENTS IN DATACENTERS,” which was filed on Oct. 16, 2008 and issued on Feb. 18, 2014 as U.S. Pat. No. 8,656,404, the content of which is hereby fully incorporated by reference herein.

TECHNICAL FIELD

The disclosed technology relates to the field of virtual machines (VMs) in data centers and, more particularly, to various techniques pertaining to the statistical packing of virtual machine resource requirements in data centers.

BACKGROUND

Data centers are frequently used by various types of entities for a wide variety of purposes. Service providers such as phone companies, cable networks, power companies, retailers, etc., commonly store and access their customers' data in ‘server farms,’ or data centers. For purposes of the present specification, ‘data center’ refers to a facility used to house computer systems and associated components, such as telecommunications and storage systems. A data center generally includes not only the computer systems, but also back-up power supplies, redundant data communications connections, environmental controls such as air conditioning and fire suppression, security systems and devices, etc.

Data center operations generally revolve around customer service levels. For example, a particular customer may desire to have a defined quality of service for that customer's computations or data communications. The quality of service may have different requirements for different customers. For example, for one customer, the key measure of the quality of service may involve how fast an application responds when accessed remotely. For another customer, the quality of service may involve the speed or bandwidth of connections provided to that customer's subscriber.

A data center may commit to provide a particular service level for a given customer in the form of a formally negotiated service level agreement (SLA). An SLA typically specifies levels of availability, serviceability, performance, operation, billing, etc., and may even specify penalties in the event of violations of the SLA. SLAs commonly address performance measurement, problem management, customer duties, warranties, disaster recovery, and termination of agreement. For example, an SLA may demand that a particular job get a certain amount of resources with a specified probability. The SLA may also specify a limit on the amount of resources to be assigned to a certain job or group of jobs.

‘Virtualization’ generally refers to a technique for hiding physical characteristics of computing resources from the way in which other systems, applications, or end users interact with those resources. This typically includes making a single physical resource (e.g., a server, operating system, application, storage device, etc.) appear to function as multiple logical resources. Virtualization may also include making multiple physical resources appear as a single logical resource. In addition, it may include making one physical resource appear, with somewhat different characteristics, as one logical resource.

VMWare, Inc., is an example of a publicly-listed company that offers virtualization software products, such as VMWare's ESX Server.

Virtualization can essentially let one computer do the job of multiple computers, by sharing the resources of a single computer across multiple environments. Virtual machines (e.g., virtual servers and virtual desktops) can provide users with the ability to host multiple operating systems and multiple applications both locally and in remote locations, freeing users from physical and geographical limitations. In addition to energy savings and lower capital expenses due to more efficient use of hardware resources, users can get a high availability of resources, better desktop management, increased security, and improved disaster recovery processes.

Virtual machines serve a wide variety of purposes in a given computer system. For example, virtual machines may be used to provide multiple users with simultaneous access to the computer system. Each user may execute applications in a different virtual machine, and the virtual machines may be scheduled for execution on the computer system hardware. Virtual machines may be used to consolidate tasks that were previously running on separate computer systems, for example, by assigning each task to a virtual machine and running the virtual machines on fewer computer systems. Virtual machines may also be used to provide increased availability. If the computer system fails, for example, tasks that were executing in virtual machines on the computer system may be transferred to similar virtual machines on another computer system.

Using virtual servers enables the migration of processing tasks to other physical servers or resources transparently to the consumers of the services provided by the virtual server, where the consumer may be a user, a process, another computer, etc. A ‘consumer’ is typically any entity that uses a process or service within the power control system. This is contrasted with a ‘customer’ which is an identified entity to which the data center provides services according to a service level agreement. Performance levels are generally tracked by customers.

A virtual server differs greatly from a physical server. A virtual server typically appears to be a single server to entities accessing it, while it may actually be a partition or subset of a physical server. It may also appear as a single server but actually be comprised of several physical servers. A virtual server is created through a virtualization process, as discussed above.

Thus, in a given data center, virtualization allows multiple virtual machines (e.g., virtual servers) to share the physical resources (e.g., CPU, memory, disk, and networking resources) of the same physical machine(s) in the data center. Each virtual machine typically has a corresponding specification of resource requirements that determines how much of the physical resources should be reserved for the given virtual machine.

However, a typical specification of resource requirements for a virtual machine undesirably overbooks or reserves more physical resources than are actually needed most of the time by the virtual machine, which results in the unnecessary wasting of physical resources. Thus, there exists a need for greater reductions in cost and power consumption by virtual machines in data centers.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows an exemplary embodiment of a data center.

FIG. 2 is a flowchart of a first exemplary computer-implemented method of managing resources in a virtual machine environment.

FIG. 3 is a flowchart of a second exemplary computer-implemented method of managing resources in a virtual machine environment.

FIG. 4 is a flowchart of a third exemplary computer-implemented method of managing resources in a virtual machine environment.

DETAILED DESCRIPTION OF THE EMBODIMENTS

FIG. 1 shows an exemplary architecture 10 for a data center. In this embodiment, the data center includes multiple physical devices 14 (e.g., servers). A physical device 14 is an actual machine, such as a quad-, dual- or single-core computing system that provides a particular service. Examples include communications servers, database servers, applications servers, etc.

As such, each physical device 14 is depicted as having at least one virtual machine 17 (e.g., virtual server) operating on it. A virtual machine 17 may include an application running on top of an operating system, for example. This discussion is provided merely for demonstrative purposes and no limitation of location or logical hierarchy is intended, nor should one be implied.

In the example, the virtual machine 17 allows a low-level module 18 (e.g., a service/power controller) to task the physical devices 14 with processing tasks in virtual machines based in part on the resource needs of the virtual machines and the resources of the physical devices 14. The low-level module 18 may be referred to as a controller or scheduler. The controller 18 can schedule the processing of virtual machines, or the controller 18 can schedule individual tasks to be performed within virtual machines. As used herein, the term “job” generally refers to the virtual machine or task being scheduled.

In the example, the controller 18 is shown as being a single controller, while it may actually be distributed across several computers, processing cores, etc. The controller 18 can migrate jobs between physical machines and adjust the power consumption of physical machines. In addition to the central controller 18, one or more of the individual physical devices 14 may have local controllers such as 16. Further, while the only devices depicted in the example are servers, other types of devices may be included, such as power supplies, storage arrays or other types of storage, tape decks, etc.

The centralized controller 18 may be coupled to data lines 20. The functions of the data center generally revolve around data processing of some sort, and the controller may merely exist in the same power distribution structure as the data lines, or the power controller may monitor or affect the operation of the data lines.

Similarly, the power controller may merely exist in the same power structure as the power lines 22, or the controller 18 may take a more active role with the power lines 22. The power lines 22 come in from ‘the grid,’ or the local power infrastructure that generally includes transmission lines, converters, transformers, power switches, etc.

In certain embodiments, the resource requirements for each virtual machine in a given virtual machine environment can be predicted statistically and potentially described by a histogram. Predictions of resource requirements can be performed by various techniques, such as “time-of-day conditioning” (i.e., conditioning based on the time of day certain resources are used) and Markov modeling (i.e., reaching future states through a probabilistic rather than deterministic process). However, one of skill in the art will recognize that any of a number of known modeling techniques can be used to generate such histograms.

“Time-of-day conditioning” and Markov modeling are typically based on observations of historic resource usage data. Each technique is capable of capturing resource needs that may be extreme at certain times of day and extreme for brief periods of time, and that may occur at unpredictable times. Such behavior typically results in the wasting of resources, because resource requirements are usually set unnecessarily high all of the time.
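
As a rough illustration of the Markov-style prediction mentioned above, the following Python sketch counts transitions between discretized usage levels in a historic sequence and uses the row for the current level as the predicted histogram for the next interval. The function names, the first-order model, and the uniform fallback for levels never observed are assumptions made for illustration; they are not taken from the disclosure.

```python
def fit_transitions(bin_sequence, num_bins):
    """Count observed transitions between consecutive usage bins.

    bin_sequence: historic usage, already discretized to bin indices.
    Returns counts[prev][next] (a hypothetical first-order Markov model).
    """
    counts = [[0] * num_bins for _ in range(num_bins)]
    for prev, nxt in zip(bin_sequence, bin_sequence[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_histogram(counts, current_bin):
    """Predicted histogram of next-interval usage given the current bin."""
    row = counts[current_bin]
    total = sum(row)
    if total == 0:
        # Assumption: with no history for this bin, fall back to uniform.
        return [1.0 / len(row)] * len(row)
    return [c / total for c in row]


history = [0, 1, 1, 0, 1, 2, 1]
counts = fit_transitions(history, num_bins=3)
next_hist = predict_histogram(counts, current_bin=1)
```

The resulting list plays the role of the per-job histogram that the packing techniques take as input.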

Predicted resource needs may take into account various types of information, such as resource usage based on time of day, day of week, what amount of resources a given job needed in a preceding time interval (e.g., within the last ten minutes), and any other information that can be used to improve the predictability of resource usage by the job.
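
One simple way to condition on time of day is to keep a separate histogram per hour. The Python sketch below is a minimal example under stated assumptions: the sample format `(hour_of_day, resource_used)`, the bin width `HIST_STEP`, and the choice to clamp large values into the top bin are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

HIST_STEP = 10  # hypothetical bin width, in arbitrary resource units


def build_hourly_histograms(samples, num_bins, hist_step=HIST_STEP):
    """Build one usage histogram per hour of day from historic samples.

    samples: iterable of (hour_of_day, resource_used) observations.
    Returns {hour: [p_0, ..., p_{num_bins-1}]}, where p_i is the observed
    fraction of that hour's samples with usage in
    [i * hist_step, (i + 1) * hist_step).
    """
    counts = defaultdict(lambda: [0] * num_bins)
    for hour, used in samples:
        # Assumption: usage beyond the last bin is clamped into the last bin.
        idx = min(int(used // hist_step), num_bins - 1)
        counts[hour][idx] += 1
    histograms = {}
    for hour, bins in counts.items():
        total = sum(bins)
        histograms[hour] = [c / total for c in bins]
    return histograms


samples = [(9, 5), (9, 15), (9, 17), (9, 95), (14, 0)]
by_hour = build_hourly_histograms(samples, num_bins=4)
```

A predictor could then look up `by_hour[current_hour]` when a new packing is computed; conditioning on day of week or on the preceding interval would extend the dictionary key in the same way.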

In certain embodiments, the predicted resource needs are provided in the form of histograms, which allow for considerable flexibility in the kind of modeling used to predict resource requirements. Prediction and packing can be performed at different frequencies. As used herein, “packing” (or “packing together”) generally refers to a technique intended to reduce a total amount of resources required by a group of jobs in a virtual machine environment by considering the resource needs of the group of jobs as a whole as well as the resource needs of each of the jobs individually.

Since virtual machine resource needs are typically well predicted over short periods of time, re-computing of the packing and adjusting resource requirements every 5-10 minutes can be very beneficial. The techniques described herein, however, can desirably be used to pack at a variety of frequencies.

Convolving histograms provides a distribution of the sum of the resource requirements of several virtual machines, assuming that their resource needs are statistically independent. When virtual machines are dependent, there are several options. For example, the virtual machines can be scheduled together in a pool, with their total resource needs represented as a single histogram that captures their correlated behavior. This pool can then participate in the packing techniques described herein. Dependent virtual machines can also be segregated into several different pools, where the members of the pool are not highly correlated. In these instances, the packing techniques described herein are applied separately to each pool.
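
The convolution step can be sketched in a few lines of Python. This is a minimal illustration that assumes each histogram entry `hist[k]` is the probability of needing exactly `k` discrete resource units (a simplification of the binned ranges) and that the two needs are statistically independent; the function name `convolve` is hypothetical.

```python
def convolve(hist_a, hist_b):
    """Distribution of the sum of two independent discretized resource needs.

    hist_a[i] and hist_b[j] are probabilities of needing i and j units;
    the result's entry k is the probability that the two needs sum to k.
    """
    out = [0.0] * (len(hist_a) + len(hist_b) - 1)
    for i, pa in enumerate(hist_a):
        for j, pb in enumerate(hist_b):
            out[i + j] += pa * pb
    return out


# Two virtual machines that each need 0 or 1 unit with equal probability.
combined = convolve([0.5, 0.5], [0.5, 0.5])
# combined is [0.25, 0.5, 0.25]: needing 2 units at once is less likely
# than either machine alone needing its maximum.
```

Because the combined distribution concentrates around its middle, the reservation needed to cover the pool at a given failure probability is usually smaller than the sum of the individual reservations.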

The packing techniques can include using a specification of required provisioning success as input. In an exemplary embodiment of the disclosed technology, it is assumed that a provided service level agreement (SLA) for a customer requires a certain probability p that resource needs are met for a given virtual machine. The probability p can be specified directly in the SLA or derived from other information in the SLA. For example, an SLA may specify that resource needs for a virtual machine are to be met all the time (with a penalty for failing to do so), in which situation the probability p would typically be assigned a small value so that failure (and penalties) would be infrequent, at most. For SLAs that specify a less well-defined resource requirement, a value for p can be determined that might be higher but would still meet a customer's expectations. Additionally, SLAs may include caps on certain criteria such as maximum resource needs.

Embodiments of the disclosed technology can take as input various types of parameters. In exemplary embodiments, input parameters include a prediction of resource needs for at least one virtual machine (e.g., presented as a histogram), and a required provisioning success (e.g., a probability p corresponding to a maximum rate of failure required to provide full resource needs).

FIG. 2 is a flowchart of an exemplary computer-implemented method 200 of managing resources in a virtual machine environment.

At 202, a required provisioning success is determined for each of multiple jobs to be scheduled in a given virtual machine environment. For example, the required provisioning success can be derived from a corresponding service level agreement (SLA), which can explicitly or implicitly provide information pertaining to the probability requirement.

At 204, a prediction of resource needs is determined for each of the jobs. The prediction can indicate an amount of resources needed by a particular job for it to properly execute, for example. Determining the prediction 204 can include using one or more techniques such as time-of-day conditioning and Markov modeling.

Steps 202 and 204 can be performed at different times or they can be performed at least partially or fully concurrently with each other.

At 206, a resource specification for each job can be determined based on the required provisioning success determined at 202 and the prediction determined at 204. The resource specification can be in a form suitable for virtual machine scheduling systems. The resource specification can, for example, include minimum and maximum resource requirements for the virtual machines. The resource specification can thus result in a reduction of the total resources reserved by the virtual machines in a cluster.

At 208, the resource specification determined for each job can be provided to a lower-level scheduling module. For example, the lower-level scheduling module can perform various types of scheduling-related operations with respect to the given jobs, such as scheduling jobs that have not been scheduled yet, consolidating jobs on fewer physical servers, and adjusting the schedule for jobs on the same physical server.

In certain embodiments, the step of determining a prediction of resource needs 204 can be repeated (e.g., at 210). Responsive to the repeated determination of the prediction of resource needs, as well as the previously determined required provisioning success, the previously determined resource specification can be adjusted. For example, the techniques used at 206 can be re-applied here.

In certain embodiments, an optimization algorithm (e.g., a greedy packing algorithm) can be used to find a packing that, while desirable, may not represent the best-case scenario (e.g., maximum packing). Such embodiments are generally preferable in situations where approximation speed is prioritized higher than maximum packing ability. In other words, these embodiments provide fast approximation that still achieves some or most of the benefit of statistical packing while potentially erring in reserving more resources than are actually required. Such erring, however, is typically only very slight and inconsequential, particularly in light of the advantageous packing.

In certain embodiments, an action list can be formed. In the action list, jobs having the most severe requirements can be placed at the beginning. The action list can then be processed in order, and a resource specification can be chosen for each virtual machine. For example, a virtual machine can be given a resource specification consisting of a minimum resource reservation that will ensure that the virtual machine will have all the resources it needs with a failure probability less than p.

While processing the action list, the algorithm may discover that it has already made enough individual minimum resource reservations such that it can ensure that the total allocation for a resource pool (e.g., an amount of resources intended for use by a group of jobs rather than a single job) is large enough that the combined requirements of the virtual machines sharing the pool have a failure probability that is not greater than the failure probability p for the individual virtual machine being processed, in which case no individual reservation may need to be made for the virtual machine being processed or for subsequent virtual machines on the action list.

In this way, the algorithm can allocate a large amount of resources to premium jobs until the total allocation reaches a level that is sufficient to satisfy a low failure probability for the pool, for example. Additional jobs can be deemed to require no separate allocation because they effectively share the reservation of the premium jobs.

FIG. 3 is a flowchart of an exemplary computer-implemented method 300 of managing resources for virtual machines in a virtual machine environment.

At 302, a required provisioning success corresponding to each of several jobs in the virtual machine environment can be determined. For example, the required provisioning success can be derived from a corresponding service level agreement (SLA).

At 304, a predicted resource need corresponding to each of the jobs can be determined. For example, the predicted resource need can be based on a-priori information given by the user in configuring the job, and can also be based on historical data from previous processing of the job.

At 306, a prioritized listing of the jobs can be formed. For example, the jobs can be ranked according to a level of importance assigned to each of them. The level of importance can be determined for each job based at least in part on the required provisioning success determined at 302 as well as other pertinent information provided by the customer (e.g., in an SLA), such as a severity level.

Once the prioritized listing has been created, a resource specification can be assigned to each of the jobs based on the prioritized listing, the job's previously determined required provisioning success, and the job's predicted resource need. At 308, the jobs are processed in order (e.g., according to level of importance) to determine an individual resource specification based on each job's individual needs as well as any resources that have been specified for the group of higher priority jobs earlier in the list and will be available to the job if they are not fully utilized by the higher priority group.

The following exemplary procedure (“ReserveIndividual”) describes how a reservation can be determined for a single virtual machine. In the example, the input includes the virtual machine's predicted resource need described by a histogram (“hist”) (e.g., an array of frequencies of resource needs where hist[i] is the probability that the virtual machine will need a resource amount between i*histStep and (i+1)*histStep). The input also includes a required provisioning success that is specified with an allowed failure probability (“prob”):

procedure ReserveIndividual(hist[.], prob)
  acc <− 0;
  i <− length of hist[.];
  while acc < prob do
    acc <− acc + hist[i];
    i <− i − 1;
  return (i + 1) * histStep
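
A direct Python transcription of the procedure might look like the sketch below. It assumes zero-based indexing, treats `hist[k]` as the probability of needing exactly `k * hist_step` units (the lower edge of each bin), and adds a bounds guard that the pseudocode leaves implicit; the parameter name `hist_step` stands in for histStep.

```python
def reserve_individual(hist, prob, hist_step=1):
    """Walk the histogram from the top bin down until the accumulated
    tail probability reaches prob, then return the reservation.

    Under the stated assumption, the probability of needing more than the
    returned amount is below prob (for prob > 0).
    """
    acc = 0.0
    i = len(hist) - 1
    # The i >= 0 guard is an addition; the pseudocode assumes the loop
    # terminates before the histogram is exhausted.
    while i >= 0 and acc < prob:
        acc += hist[i]
        i -= 1
    return (i + 1) * hist_step


hist = [0.1, 0.2, 0.4, 0.2, 0.1]
loose = reserve_individual(hist, prob=0.25, hist_step=10)   # 30
strict = reserve_individual(hist, prob=0.05, hist_step=10)  # 40
```

With an allowed failure probability of 0.25 the top bin can be left uncovered, so 30 units suffice; tightening the allowance to 0.05 forces the reservation up to 40 units.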

A smaller total reservation can be achieved if the reservations are computed for a group of virtual machines together, as described in the following exemplary procedure (“ReserveGroup”). In the example, the histograms for all the virtual machines are given in a two-dimensional array (“histograms[ . , . ]”).

procedure ReserveGroup(histograms[.,.], probs[.])
  combined <− Convolve all histograms;
  for each individual virtual machine j do
    intercept <− ReserveIndividual(histograms[j,.], probs[j]);
    otherIntercept <− ReserveIndividual(combined[.], probs[j]);
    actionList[j] <− {j, intercept, otherIntercept}
    solution[j] <− 0;
  sort actionList in descending order of its third component “otherIntercept”
  acc <− 0
  while actionList has elements do
    front <− remove first element from actionList
    if front[3] > acc then
      allocation <− Min(front[2], front[3] − acc);
      solution[front[1]] <− allocation;
      acc <− acc + allocation;
    else
      return solution
  return solution

The exemplary “ReserveGroup” procedure forms an action list where the jobs with the most severe requirements are placed at the beginning of the list. The action list is processed in order, and an allocation is chosen for each virtual machine according to one of the following two strategies: either the virtual machine is given its individual requirement necessary to meet its allowed failure probability (e.g., prob[j] from the previous procedure), or the total allocation for the pool is ensured to be large enough that the combined virtual machine requirements will not fail with a probability greater than the probability prob[j] for the individual virtual machine being processed.
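
The greedy procedure can be sketched in Python as follows. This is an illustrative transcription, not a definitive implementation: it assumes zero-based indexing, a bin width of one unit, histograms in which `hist[k]` is the probability of needing exactly `k` units, and statistically independent jobs. The helper names `convolve` and `reserve_individual` are hypothetical.

```python
from functools import reduce


def convolve(hist_a, hist_b):
    """Distribution of the sum of two independent discretized needs."""
    out = [0.0] * (len(hist_a) + len(hist_b) - 1)
    for i, pa in enumerate(hist_a):
        for j, pb in enumerate(hist_b):
            out[i + j] += pa * pb
    return out


def reserve_individual(hist, prob):
    """Mirror of ReserveIndividual with a bin width of one unit."""
    acc, i = 0.0, len(hist) - 1
    while i >= 0 and acc < prob:
        acc += hist[i]
        i -= 1
    return i + 1


def reserve_group(histograms, probs):
    """Greedy group reservation following the ReserveGroup procedure."""
    combined = reduce(convolve, histograms)
    actions = []
    for j, (hist, prob) in enumerate(zip(histograms, probs)):
        intercept = reserve_individual(hist, prob)
        other_intercept = reserve_individual(combined, prob)
        actions.append((j, intercept, other_intercept))
    # Most severe pool requirement first.
    actions.sort(key=lambda a: a[2], reverse=True)
    solution = [0] * len(histograms)
    acc = 0
    for j, intercept, other_intercept in actions:
        if other_intercept <= acc:
            break  # pool is already large enough for this and later jobs
        allocation = min(intercept, other_intercept - acc)
        solution[j] = allocation
        acc += allocation
    return solution


# Three jobs that each need 1 unit with probability 0.1, else 0 units.
histograms = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]
packed = reserve_group(histograms, [0.05, 0.05, 0.05])
```

In this example each job would reserve 1 unit if treated alone (3 units in total), whereas the group result reserves a single unit, because the probability that the three jobs together need more than 1 unit is about 0.028.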

In certain embodiments, an exact optimal solution can be determined. In these embodiments, a binary search can be performed to find an optimal “total” reservation for the pool. For example, the feasibility of each postulated value for the “total” can be tested by computing individual reservations. If the individual reservations sum to less than the postulated “total,” then the “total” can be deemed to be feasible. The binary search can be used to find the smallest possible “total” that is feasible. Individual reservations for virtual machines can be computed by considering the joint distribution of each individual virtual machine's histogram and a combined histogram for all of the other virtual machines in the environment.

For an exemplary postulated value of a “total” allocation, individual reservations can be computed with a function (see, e.g., the “Required” function below) that, when considering a single virtual machine in a given environment, can consider the resource needs of all of the other virtual machines in the environment represented by a (cumulative) convolution (see, e.g., “accumulatedOther” below) and then find the smallest reservation i such that the probability that the virtual machine needs more than its resource reservation (and cannot obtain extra resources from the resource pool) is less than a specified failure probability. In certain embodiments, the reservation i represents the substantially smallest value such that the probability that the virtual machine requires j (which is more than i) and that the “accumulatedOther” requires more than the “total” minus j, is less than the specified failure probability.

procedure BuildAccumulatedOther(histograms[.,.], j)
  combinedOther <− convolve all histograms except for j
  acc <− 0;
  i <− length of combinedOther;
  while i > 0 do
    acc <− acc + combinedOther[i];
    accumulatedOther[i] <− acc;
    i <− i − 1;
  return accumulatedOther[.];

procedure Required(accumulatedOther[.], hist[.], total, prob)
  accFailure <− 0;
  i <− length of hist[.]
  while accFailure < prob do
    if i < total then
      otherFailure <− accumulatedOther[total − i];
    else
      otherFailure <− 1.0;
    accFailure <− accFailure + hist[i] * otherFailure;
    i <− i − 1;
  return i + 1;
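
The two procedures can be sketched in Python under explicit assumptions. Indexing is zero-based, `hist[k]` is taken to be the probability of needing exactly `k` units, and jobs are assumed independent. The sketch accumulates the failure probability over all needs above the candidate reservation, as the preceding description calls for, and it reads "more than the total minus j" strictly by looking up index `total - i + 1`; that strict reading and the out-of-range guard are interpretive choices, not details confirmed by the disclosure.

```python
from functools import reduce


def convolve(hist_a, hist_b):
    """Distribution of the sum of two independent discretized needs."""
    out = [0.0] * (len(hist_a) + len(hist_b) - 1)
    for i, pa in enumerate(hist_a):
        for j, pb in enumerate(hist_b):
            out[i + j] += pa * pb
    return out


def build_accumulated_other(histograms, j):
    """tail[k] = probability that all jobs other than j need >= k units."""
    others = [h for idx, h in enumerate(histograms) if idx != j]
    # [1.0] means "needs 0 units with certainty" when there are no others.
    combined_other = reduce(convolve, others, [1.0])
    tail = [0.0] * len(combined_other)
    acc = 0.0
    for k in range(len(combined_other) - 1, -1, -1):
        acc += combined_other[k]
        tail[k] = acc
    return tail


def required(accumulated_other, hist, total, prob):
    """Smallest reservation for one job given a postulated pool total."""
    acc_failure = 0.0
    i = len(hist) - 1
    while i >= 0 and acc_failure < prob:
        if i <= total:
            k = total - i + 1  # others must need strictly more than total - i
            other_failure = (
                accumulated_other[k] if k < len(accumulated_other) else 0.0
            )
        else:
            other_failure = 1.0  # this job alone exceeds the whole pool
        acc_failure += hist[i] * other_failure
        i -= 1
    return i + 1


histograms = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]
tail = build_accumulated_other(histograms, 0)
with_pool = required(tail, histograms[0], total=1, prob=0.05)
without_pool = required(tail, histograms[0], total=0, prob=0.05)
```

With a postulated pool of 1 unit the job needs no individual reservation, since it fails only when it and at least one other job need a unit at the same time; with an empty pool it must reserve its own unit.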

FIG. 4 is a flowchart of an exemplary computer-implemented method 400 of managing resources for virtual machines in a virtual machine environment.

At 402, a required provisioning success corresponding to each of several jobs in the virtual machine environment can be determined. For example, the required provisioning success can be derived from a corresponding SLA.

At 404, a predicted resource need corresponding to each of the jobs can be determined. For example, the predicted resource need can be based on a-priori information given by the user in configuring the job, and can also be based on historical data from previous processing of the job.

At 406, a total resource specification can be postulated. Once the total resource specification has been postulated, individual resource specifications for the virtual machines can be determined based on the postulated total, as shown at 408.

At 410, a comparison can be made between the total of the individual resource specifications (determined at 408) and the total resource specification (postulated at 406). If an improvement can be made, then a new total resource specification can be postulated (as shown at 412) and processing can return to 408. Otherwise, if continued processing shows no indication of further improvement, the process can finish, as shown at 414.

Below is an exemplary procedure (“ReserveGroupExact”) that can be used in conjunction with certain embodiments of the disclosed technology:

procedure ReserveGroupExact(histograms[.,.], probs[.])
  combined <− convolve all histograms
  for each VM j do
    accumulatedOther[j] <− BuildAccumulatedOther(histograms[.,.], j);
  maxTotal <− ReserveIndividual(combined, min of probs);
  minTotal <− 0;
  solution <− TestSolution(maxTotal,...);
  while maxTotal > minTotal + 1 do
    middle <− Floor((maxTotal + minTotal)/2);
    temp <− TestSolution(middle,...);
    if temp is not feasible
      minTotal <− middle;
    else
      maxTotal <− middle;
      solution <− temp;
  solutionTotal <− sum of solution;
  extra <− maxTotal − solutionTotal;
  distribute extra evenly among solution;

procedure TestSolution(total, accumulatedOther[.], histograms[.,.], probs[.])
  for each VM i do
    required[i] <− Required(accumulatedOther[i], histograms[i], total, probs[i]);
  if sum of required does not exceed total
    return required (indicating it is feasible)
  else
    return infeasible
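
The binary search can be sketched end to end in Python. The sketch is self-contained and rests on several assumptions that are not part of the disclosure: zero-based indexing, a bin width of one unit, independent jobs, a strict reading of "more than the total minus j", and a simple integer rule for distributing the extra units (earlier jobs receive any remainder). The names `try_total`, `tail_sums`, and the `None` marker for infeasibility are hypothetical.

```python
from functools import reduce


def convolve(a, b):
    out = [0.0] * (len(a) + len(b) - 1)
    for i, pa in enumerate(a):
        for j, pb in enumerate(b):
            out[i + j] += pa * pb
    return out


def tail_sums(hist):
    """tail[k] = probability of needing >= k units."""
    tail, acc = [0.0] * len(hist), 0.0
    for k in range(len(hist) - 1, -1, -1):
        acc += hist[k]
        tail[k] = acc
    return tail


def reserve_individual(hist, prob):
    acc, i = 0.0, len(hist) - 1
    while i >= 0 and acc < prob:
        acc += hist[i]
        i -= 1
    return i + 1


def required(acc_other, hist, total, prob):
    acc_failure, i = 0.0, len(hist) - 1
    while i >= 0 and acc_failure < prob:
        if i <= total:
            k = total - i + 1  # others need strictly more than total - i
            other_failure = acc_other[k] if k < len(acc_other) else 0.0
        else:
            other_failure = 1.0  # need alone exceeds the whole pool
        acc_failure += hist[i] * other_failure
        i -= 1
    return i + 1


def try_total(total, acc_others, histograms, probs):
    """Counterpart of TestSolution; returns None when infeasible."""
    req = [required(a, h, total, p)
           for a, h, p in zip(acc_others, histograms, probs)]
    return req if sum(req) <= total else None


def reserve_group_exact(histograms, probs):
    n = len(histograms)
    combined = reduce(convolve, histograms)
    acc_others = [
        tail_sums(reduce(convolve, histograms[:j] + histograms[j + 1:], [1.0]))
        for j in range(n)
    ]
    max_total = reserve_individual(combined, min(probs))
    min_total = 0
    solution = try_total(max_total, acc_others, histograms, probs)
    if solution is None:
        raise ValueError("starting total is not feasible")
    # Narrow [min_total, max_total] to the smallest feasible total.
    while max_total > min_total + 1:
        middle = (max_total + min_total) // 2
        temp = try_total(middle, acc_others, histograms, probs)
        if temp is None:
            min_total = middle
        else:
            max_total, solution = middle, temp
    # Assumption: leftover units go to jobs in index order.
    base, rem = divmod(max_total - sum(solution), n)
    return [r + base + (1 if j < rem else 0) for j, r in enumerate(solution)]


reservations = reserve_group_exact([[0.5, 0.5], [0.5, 0.5]], [0.1, 0.1])
```

The search relies on feasibility being monotone in the postulated total, which the binary search in the procedure also presumes.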

The techniques described herein can achieve a reduced total resource specification by considering the resource needs of a group of jobs when the resource needs of individual jobs are determined. The group of jobs often has more predictable needs, which typically means that there is less need for excess individual resource specification. In the example of FIG. 3, the group is a collection of higher priority jobs that were already specified. In the example of FIG. 4, the group is the entire collection of jobs, whose specification is postulated. One having ordinary skill in the art will understand that a number of possible choices exist to reduce uncertainty and reduce the resource specifications.

The techniques described herein typically assume that the given jobs are independent, although such techniques may be modified to handle a virtual environment having multiple jobs, some of which may be dependent upon other jobs in the environment.

Additionally, two or more of the techniques described above can be flexibly implemented in combination with one another. For example, in some embodiments, an approximation-type implementation can be utilized and, if certain parameters are met (e.g., if there is still enough processing time left), an exact-optimal-solution-type implementation can also be utilized (e.g., to refine the solution).

The techniques described herein can provide an output representing a minimum reservation that can be made for each applicable virtual machine. This minimum reservation can be a typical parameter in a virtual machine specification language (e.g., in VMWare and other virtualization products). These reservations can be computed such that the virtual machines can be combined in a single pool on a cluster of physical machines. A scheduler (typically part of virtualization products) can allocate resources and locate jobs on physical machines to meet the minimum reservations first, before allocating excess resource to other jobs. Exemplary embodiments of the disclosed technology can assume that minimum resources will be met first but make no assumption about how resources in excess of minimums are to be shared among virtual machines.

Application of the techniques described herein can desirably allow data center operators and users (e.g., customers) to rely on more accurate and more compact physical resource reservations in a data center, which provides various advantages. For example, the freeing of physical resources for more customers will typically result in improved business performance for the data center, and the reduction of the number of running physical servers will reduce energy costs for data center operators and reduce costs for users.

The various advantageous techniques described herein may be implemented as computer-implemented methods. Additionally, they may be implemented as instructions stored on a tangible computer-readable medium that, when executed, cause a computer to perform the associated methods. Examples of tangible computer-readable media include, but are not limited to, disks (such as floppy disks, rigid magnetic disks, optical disks, etc.), drives (e.g., hard disk drives), semiconductor or solid state memory (e.g., RAM and ROM), and various other types of recordable media such as CD-ROM, DVD-ROM, and magnetic tape devices.

It will be appreciated that several of the above-disclosed and other features and functions, or alternatives thereof, may be desirably combined into many other different systems or applications. Also, various presently unforeseen or unanticipated alternatives, modifications, variations, or improvements therein may be subsequently made by those skilled in the art, which are also intended to be encompassed by the following claims.

What is claimed is:
 1. A computer-implemented method of managing resources for at least one virtual machine in a virtual machine environment, comprising: determining a specification of provisioning success corresponding to each of a plurality of jobs in the virtual machine environment; forming a prioritized listing of the plurality of jobs; and responsive to the specification of provisioning success and the prioritized listing, providing a resource specification for each of the plurality of jobs, wherein the providing comprises: determining a first prediction of resource needs corresponding to each of a first subset of the plurality of jobs; and determining a second prediction of resource needs corresponding to a second subset of the plurality of jobs.
 2. The computer-implemented method of claim 1, wherein the first subset of the plurality of jobs comprises jobs having a resource specification that meets or exceeds a specified severity level.
 3. The computer-implemented method of claim 1, further comprising providing a first resource specification corresponding to each of the first subset of the plurality of jobs.
 4. The computer-implemented method of claim 3, further comprising providing a second resource specification corresponding to a resource pool corresponding to the second subset of the plurality of jobs.
 5. The computer-implemented method of claim 4, wherein the resource pool comprises an amount of resources remaining after the first resource specification corresponding to each of the first subset of the plurality of jobs has been provided.
 6. The computer-implemented method of claim 1, wherein the second prediction of resource needs meets or exceeds a specified probability threshold.
 7. A computer-implemented method of managing resources for at least one virtual machine, comprising: determining a failure probability for a plurality of jobs corresponding to the at least one virtual machine; determining a prediction of resource needs corresponding to each of the plurality of jobs; generating a total prediction of resource needs representing a sum of each prediction of resource needs; and determining whether the total prediction of resource needs is below a specified threshold.
 8. The computer-implemented method of claim 7, wherein, responsive to determining that the total prediction of resource needs is below the specified threshold, generating a resource specification corresponding to each of the plurality of jobs.
 9. The computer-implemented method of claim 7, wherein, responsive to determining that the total prediction of resource needs meets or exceeds the specified threshold, adjusting the specified threshold.
 10. The computer-implemented method of claim 9, further comprising repeating determining whether the total prediction of resource needs is below the adjusted specified threshold.
 11. The computer-implemented method of claim 10, wherein, responsive to determining that the total prediction of resource needs is below the adjusted specified threshold, generating a resource specification corresponding to each of the plurality of jobs.
 12. The computer-implemented method of claim 10, wherein, responsive to determining that the total prediction of resource needs meets or exceeds the adjusted specified threshold, adjusting the specified threshold.
 13. The computer-implemented method of claim 7, further comprising determining an initial resource test interval, wherein the specified threshold is at about a halfway point within the initial resource test interval.
 14. The computer-implemented method of claim 12, further comprising determining an initial resource test interval, wherein adjusting the specified threshold comprises adjusting the initial resource test interval.